Prospective students have many options for where to earn a graduate accounting degree. On the East Coast alone, more than a hundred universities offer one. Adding to this complexity, students must also evaluate whether each program teaches the skills that employers are recruiting for. Some skills are technical and cover specific topics such as SQL, Python, and statistics, while others are ‘soft’ skills like teamwork and collaboration.
In this project, we show how to build a recommender system using tidymodels by first loading pre-processed college metadata and sought-after data science skills. After classifying each program according to whether it offers sufficient data science training, we create a sample set and build a recommender system.
The resulting recommender system can classify any other accounting program as offering good data science training or not.
# Libraries
library(tidyverse)
library(tidytext)
# For tidymodels
library(DiceDesign)
library(tidymodels)
library(workflows)
library(tune)
library(mlbench)
library(rsample)
library(recipes)
library(parsnip)
library(yardstick)
library(tm)
# For mapping
library(sf)
library(leaflet)
library(htmltools)
Websites for colleges are vastly different from one another in terms of HTML structure and website layout. For example, for some colleges, when navigating to their course descriptions page, the page itself will contain links to PDFs.
Figure 1: Course description page for Angelo State University
For other colleges, the course descriptions appear on the page itself instead of in a PDF, as shown in Figure 2.
Another plan the team considered was to ignore the websites and parse the course catalogue PDFs for every college with a graduate accounting program. However, we ran into a similar problem: the PDFs also differ vastly from one another in layout, as a comparison of Figure 3 and Figure 4 shows.
Figure 3: A snippet of the graduate accounting course descriptions for Angelo State University taken from the 2019-2020 graduate catalogue
Given these obstacles to scraping college course descriptions from the web, the team decided it would be best to use the data collected by Dr. Foy’s students, which was manually copied and pasted.
Building on the prior work by Team Four, we load three data frames:
* Graduate accounting programs on the East Coast
* Dictionary of technical skills desired by employers
* Dictionary of soft skills desired by employers
accounting_programs <- read_csv("https://github.com/cliftonleesps/607_final_project/blob/master/Acct_Curricula2.csv?raw=true", show_col_types = FALSE)
technical_skills <- read_csv("https://github.com/cliftonleesps/607_final_project/raw/master/technical_skills.csv", show_col_types = FALSE)
soft_skills <- read_csv("https://github.com/cliftonleesps/607_final_project/raw/master/soft_skills.csv", show_col_types = FALSE)
# Geocoding Schools from Kratika Patel
library(sf)
library(tidyverse)
url1 <- "https://raw.githubusercontent.com/cliftonleesps/607_final_project/master/Acct_Curricula2.csv"
AcctCurricula <- data.frame(read.csv(url1))
col <- colnames(AcctCurricula)
col <- toupper(col)
col[1] <- "NAME"
colnames(AcctCurricula) <- col
Names <- AcctCurricula %>% select("NAME")
Names <- data.frame(NAME = unique(Names$NAME))
url2 <- "https://raw.githubusercontent.com/cliftonleesps/607_final_project/master/EDGE_GEOCODE_POSTSECSCH_2021.csv"
schools <- data.frame(read.csv(url2))
col <- colnames(schools)
col[1] <- "UNITID"
colnames(schools) <- col
#head(schools)
SchoolGeo <- schools %>%
filter(NAME %in% Names$NAME)
#Correct typos and clean names of Universities not detected in schools dataframe
Names %>%
filter(!(NAME %in% schools$NAME))
## NAME
## 1 Fitchberg State University
## 2 Pennsylvania State University
## 3 Saint Joseph's University\n
## 4 Strayer University - Delaware
## 5 Strayer University-North Carolina (online, for-profit)
## 6 University of Massachussetts - Amherst
## 7 University of Massachussetts - Dartmouth
## 8 University of North Carolina Chapel Hill
Names$NAME[Names$NAME == "Fitchberg State University"] <- "Fitchburg State University"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "Fitchberg State University", :
## invalid factor level, NA generated
Names$NAME[Names$NAME == "Saint Joseph's University\n"] <- "Saint Joseph's University"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "Saint Joseph's University\n", :
## invalid factor level, NA generated
Names$NAME[Names$NAME == "Pennsylvania State University"] <- "Pennsylvania State University-Penn State Harrisburg"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "Pennsylvania State
## University", : invalid factor level, NA generated
Names$NAME[Names$NAME == "Strayer University - Delaware"] <- "Strayer University-Delaware"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "Strayer University -
## Delaware", : invalid factor level, NA generated
Names$NAME[Names$NAME == "Strayer University-North Carolina (online, for-profit)"] <- "Strayer University-North Carolina"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "Strayer University-North
## Carolina (online, for-profit)", : invalid factor level, NA generated
Names$NAME[Names$NAME == "University of Massachussetts - Amherst"] <- "University of Massachusetts-Amherst"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "University of Massachussetts -
## Amherst", : invalid factor level, NA generated
Names$NAME[Names$NAME == "University of Massachussetts - Dartmouth"] <- "University of Massachusetts-Dartmouth"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "University of Massachussetts -
## Dartmouth", : invalid factor level, NA generated
Names$NAME[Names$NAME == "University of North Carolina Chapel Hill"] <- "University of North Carolina at Chapel Hill"
## Warning in `[<-.factor`(`*tmp*`, Names$NAME == "University of North Carolina
## Chapel Hill", : invalid factor level, NA generated
SchoolGeo <- schools %>%
filter(NAME %in% Names$NAME)
s <- schools %>% filter(NAME== "University of Connecticut")
SchoolGeo <- add_row(SchoolGeo, s)
SchoolGeo[39,2] <- "Fitchberg State University"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "Fitchberg State University"):
## invalid factor level, NA generated
SchoolGeo[130,2] <- "Saint Joseph's University\n"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "Saint Joseph's University\n"):
## invalid factor level, NA generated
SchoolGeo[1,2] <- "Pennsylvania State University"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "Pennsylvania State University"):
## invalid factor level, NA generated
SchoolGeo[142,2] <- "Strayer University - Delaware"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "Strayer University - Delaware"):
## invalid factor level, NA generated
SchoolGeo[143,2] <- "Strayer University-North Carolina (online, for-profit)"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "Strayer University-North
## Carolina (online, for-profit)"): invalid factor level, NA generated
SchoolGeo[41,2] <- "University of Massachussetts - Amherst"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "University of Massachussetts -
## Amherst"): invalid factor level, NA generated
SchoolGeo[45,2] <- "University of Massachussetts - Dartmouth"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "University of Massachussetts -
## Dartmouth"): invalid factor level, NA generated
SchoolGeo[113,2] <- "University of North Carolina Chapel Hill"
## Warning in `[<-.factor`(`*tmp*`, iseq, value = "University of North Carolina
## Chapel Hill"): invalid factor level, NA generated
#Remove Duplicate row for Pennsylvania State University-Penn State Harrisburg
SchoolGeo <- SchoolGeo[!(SchoolGeo$UNITID == 49576722),]
#glimpse(SchoolGeo)
# subset(SchoolGeo, NAME == "Ramapo College of New Jersey")
#
# ?inner_join
#
# t <- right_join(SchoolGeo, temp_schools, by = c("NAME"= "name"))
# subset(t, NAME == "Ramapo College of New Jersey")
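The "invalid factor level, NA generated" warnings above arise because the NAME columns were read as factors, and assigning a value that is not an existing level produces NA. A minimal sketch of one alternative follows, assuming the names are kept as character strings and the corrections live in a lookup table. The toy data frames, the coordinates, and the `GEO_NAME` column are hypothetical and only illustrate the idea; this is not the method used in the original analysis.

```r
library(dplyr)

# Toy stand-ins (hypothetical values) for the curriculum names and the geocode table
names_raw <- tibble::tibble(
  NAME = c("Fitchberg State University", "Towson University")
)
schools_toy <- tibble::tibble(
  UNITID = c(1L, 2L),
  NAME = c("Fitchburg State University", "Towson University"),
  LAT = c(42.59, 39.39),   # illustrative coordinates only
  LON = c(-71.79, -76.61)
)

# Lookup table: name as typed in the curriculum file -> name used in the geocode file
name_fixes <- c("Fitchberg State University" = "Fitchburg State University")

# Keep the original spelling in NAME and store the corrected spelling in GEO_NAME;
# names without an entry in the lookup table are left unchanged
names_clean <- names_raw %>%
  mutate(GEO_NAME = coalesce(unname(name_fixes[NAME]), NAME))

# Join on the corrected spelling; NAME still holds the curriculum spelling,
# so no hard-coded row indices are needed to restore it afterwards
school_geo_toy <- names_clean %>%
  left_join(schools_toy, by = c("GEO_NAME" = "NAME"))

school_geo_toy
```

Because the curriculum spelling and the geocode spelling sit side by side, a later join back to the scored schools can use `NAME` directly.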
The collegiate accounting data in its raw form requires a little tidying. Each row is an observation of one course and its curriculum description. We collect the words of each school’s descriptions into a vector and join it with a vector of technical skills and a vector of soft skills. If there are any matches, the corresponding match_technical_skills or match_soft_skills attribute is set from zero to one.
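The matching idea described above can be sketched on toy data as follows. This is a simplified illustration, not the code used in this project: the two colleges, their descriptions, and the three-word dictionary are made up, and `unnest_tokens()` from tidytext stands in for the manual string cleaning.

```r
library(dplyr)
library(tidytext)

# Toy stand-ins (hypothetical values) for aggregated descriptions and a skill dictionary
descriptions_toy <- tibble::tibble(
  name = c("College A", "College B"),
  description = c("Students query data with SQL and Python.",
                  "Covers auditing standards and tax law.")
)
technical_toy <- tibble::tibble(word = c("sql", "python", "statistics"))

# One row per (school, word); unnest_tokens() lowercases and strips punctuation
words_toy <- descriptions_toy %>%
  unnest_tokens(word, description) %>%
  anti_join(stop_words, by = "word") %>%
  distinct(name, word)

# A school is flagged 1 when at least one of its words appears in the dictionary
flags_toy <- words_toy %>%
  mutate(hit = word %in% technical_toy$word) %>%
  group_by(name) %>%
  summarise(match_technical_skills = as.numeric(any(hit)), .groups = "drop")

flags_toy
```

Note that matching on single words, here and in the loop-based code, cannot detect skills written as multi-word phrases.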
# initialize some counters
current_school <- accounting_programs$School[1]
description <- accounting_programs$Description[1]
# temp_schools is where we keep our tidy data
temp_schools <- tibble(
name = "",
description = "",
match_technical_skills = 0,
match_soft_skills = 0
)
# Iterate through the accounting programs.
# A college appears on more than one row, so we aggregate all of its course
# descriptions, grouping by college name.
for (row in 2:nrow(accounting_programs)) {
  # if we detect a different school name, then save the finished school to the tibble
  if (current_school != accounting_programs$School[row]) {
    temp_schools <- temp_schools %>%
      add_row(
        name = current_school,
        # store only the finished school's text; the current row belongs to the next school
        description = description,
        match_technical_skills = 0,
        match_soft_skills = 0
      )
    description <- accounting_programs$Description[row]
    current_school <- accounting_programs$School[row]
  } else if (!is.na(accounting_programs$Description[row])) {
    # Keep appending the description, separated by a space so that words do not fuse
    description <- paste(description, accounting_programs$Description[row])
  }
}
# Add the last school to the tibble (its final row was already handled inside the loop)
temp_schools <- temp_schools %>%
  add_row(
    name = current_school,
    description = description,
    match_technical_skills = 0,
    match_soft_skills = 0
  )
# delete the first row
nrow(temp_schools)
## [1] 151
temp_schools <- temp_schools[-1,]
nrow(temp_schools)
## [1] 150
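The same per-school aggregation can also be expressed without an explicit loop. The sketch below uses made-up course rows and assumes only that the data has `School` and `Description` columns, as in the code above. It is an alternative illustration, not the approach taken in this project.

```r
library(dplyr)

# Toy course-level data (hypothetical values); one row per course
courses_toy <- tibble::tibble(
  School = c("College A", "College A", "College B"),
  Description = c("Introduces SQL and statistics.", NA,
                  "Covers auditing and teamwork.")
)

# Drop missing descriptions, then paste each school's descriptions together
schools_toy <- courses_toy %>%
  filter(!is.na(Description)) %>%
  group_by(School) %>%
  summarise(description = paste(Description, collapse = " "), .groups = "drop")

schools_toy
```

One design consequence: a school whose descriptions are all missing would disappear from the result, so it would need to be added back if every school must be scored.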
# Function to clean a description and remove duplicate words; used in the next for loop
rem_dup_word <- function(x) {
  x <- tolower(x)
  x <- gsub("-", " ", x)
  x <- gsub("/", " ", x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("[[:digit:]]", "", x)
  x <- gsub("this course", "", x)
  x <- gsub("topics include", "", x)
  # Split into words, drop the tidytext stop words, and keep each remaining word once.
  # anti_join() is called without `by`, so it prints a "Joining, by" message on every call.
  words <- tibble(word = unlist(strsplit(x, split = " ", fixed = FALSE, perl = TRUE))) %>%
    anti_join(stop_words) %>%
    pull(word)
  return(paste(unique(trimws(words)), collapse = " "))
}
# now iterate through the schools and split the descriptions
for (count in 1:nrow(temp_schools)) {
  # get the current row
  ts <- temp_schools[count,]
  # Obtain the school name
  school_name <- ts[1]
  # Use the rem_dup_word function on the 2nd element of ts, which contains the
  # course description
  description_string <- rem_dup_word(ts[2])
  # Make each word in the `description_string` character vector a row element
  # in a `school_descriptions` dataframe.
  school_descriptions <- data.frame(as.list(str_split(description_string, " ")))
  # Change the column name in the `school_descriptions` dataframe
  colnames(school_descriptions) <- c("word")
  # now join with the technical skills
  # If any words match the vector of technical skills, then we
  # set match_technical_skills (column 3) to 1
  technical_skill_match <- inner_join(technical_skills, school_descriptions, by = "word")
  if (nrow(technical_skill_match) > 0) {
    #print (school_name)
    temp_schools[count,][3] <- 1
  }
  # now join with the soft skills
  # If any words match the vector of soft skills, then we
  # set match_soft_skills (column 4) to 1
  soft_skills_match <- inner_join(soft_skills, school_descriptions, by = "word")
  if (nrow(soft_skills_match) > 0) {
    #print (school_name)
    temp_schools[count,][4] <- 1
  }
}
## Joining, by = "word"
# create a new column school_score = match_technical_skills + match_soft_skills
temp_schools <- temp_schools %>% mutate (school_score = match_technical_skills + match_soft_skills)
# create another new column good_data_science_program = [YES,NO]
temp_schools <- temp_schools %>% mutate (good_data_science_program = ifelse( school_score >= 2, "YES", "NO"))
# drop the description column since it takes a lot of
# memory
temp_schools <- subset(temp_schools, select = -c(2))
ncol(temp_schools)
## [1] 5
# now join with SchoolGeo so we get the latitude and longitude
temp_schools <- right_join(SchoolGeo, temp_schools, by = c("NAME"= "name"))
#temp_schools <- inner_join(SchoolGeo, temp_schools, by = c("NAME"= "name"))
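Because `right_join()` keeps every scored school, any school whose name has no exact match in the geocode table ends up with missing coordinates. A quick way to list such schools is an `anti_join()`, sketched here on made-up data; the toy tables and the misspelled "Colege C" are hypothetical.

```r
library(dplyr)

# Toy geocode table and toy scored schools (hypothetical values)
geo_toy <- tibble::tibble(
  NAME = c("College A", "College B"),
  LAT = c(40.1, 41.2),
  LON = c(-74.0, -73.5)
)
scores_toy <- tibble::tibble(
  name = c("College A", "College B", "Colege C"),
  school_score = c(2, 1, 2)
)

# Rows of scores_toy whose name has no counterpart in geo_toy
unmatched <- anti_join(scores_toy, geo_toy, by = c("name" = "NAME"))

unmatched
```

Running a check like this before mapping makes it clear which names still need cleaning.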
# Start building the model
set.seed(4393003)
sample_size <- 100
glimpse(temp_schools)
## Rows: 150
## Columns: 27
## $ UNITID <int> 129215, 129242, 130253, 130943, 130989, 1311…
## $ NAME <chr> "Eastern Connecticut State University", "Fai…
## $ STREET <fct> "83 Windham St", "1073 N Benson Rd", "5151 P…
## $ CITY <fct> Willimantic, Fairfield, Fairfield, Newark, W…
## $ STATE <fct> CT, CT, CT, DE, DE, DE, FL, FL, FL, FL, FL, …
## $ ZIP <fct> 06226, 06824-5195, 06825-1000, 19716, 19808,…
## $ STFIP <fct> 09, 09, 09, 10, 10, 10, 12, 12, 12, 12, 12, …
## $ CNTY <fct> 09015, 09001, 09001, 10003, 10003, 10003, 12…
## $ NMCNTY <fct> Windham County, Fairfield County, Fairfield …
## $ LOCALE <fct> 31, 21, 21, 21, 21, 21, 21, 21, 13, 21, 12, …
## $ LAT <dbl> 41.72167, 41.15767, 41.22089, 39.67958, 39.7…
## $ LON <dbl> -72.21875, -73.25590, -73.24333, -75.75282, …
## $ CBSA <fct> 49340, 14860, 14860, 37980, 37980, 37980, 33…
## $ NMCBSA <fct> "Worcester, MA-CT", "Bridgeport-Stamford-Nor…
## $ CBSATYPE <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CSA <fct> 148, 408, 408, 428, 428, 428, 370, 422, 370,…
## $ NMCSA <fct> "Boston-Worcester-Providence, MA-RI-NH-CT", …
## $ NECTA <fct> 79300, 71950, 71950, N, N, N, N, N, N, N, N,…
## $ NMNECTA <fct> "Willimantic, CT", "Bridgeport-Stamford-Norw…
## $ CD <fct> 0902, 0904, 0904, 1000, 1000, 1000, 1224, 12…
## $ SLDL <fct> 09049, 09133, 09134, 10025, 10021, 10017, 12…
## $ SLDU <fct> 09029, 09028, 09028, 10008, 10004, 10012, 12…
## $ SCHOOLYEAR <fct> 2020-2021, 2020-2021, 2020-2021, 2020-2021, …
## $ match_technical_skills <dbl> 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,…
## $ match_soft_skills <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,…
## $ school_score <dbl> 1, 2, 2, 2, 2, 2, 2, 0, 1, 2, 2, 2, 2, 2, 2,…
## $ good_data_science_program <chr> "NO", "YES", "YES", "YES", "YES", "YES", "YE…
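# Note: sample() applied to a data frame draws columns, not rows, so this returns
# two randomly chosen columns for all 150 schools. random_schools is not used
# when building the model.
random_schools <- sample(temp_schools, size = 2, replace = FALSE)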
random_schools
## CITY school_score
## 1 Willimantic 1
## 2 Fairfield 2
## 3 Fairfield 2
## 4 Newark 2
## 5 Wilmington 2
## 6 New Castle 2
## 7 Miami 2
## 8 Orlando 0
## 9 Boca Raton 1
## 10 Miami 2
## 11 Lakeland 2
## 12 Tallahassee 2
## 13 Gainesville 2
## 14 Coral Gables 2
## 15 Jacksonville 2
## 16 Tampa 2
## 17 Tampa 2
## 18 Babson Park 2
## 19 Pensacola 2
## 20 Orono 2
## 21 Standish 2
## 22 Portland 2
## 23 Waterville 2
## 24 Baltimore 2
## 25 Baltimore 0
## 26 Adelphi 2
## 27 College Park 2
## 28 Baltimore 2
## 29 Towson 2
## 30 Worcester 2
## 31 Wellesley 2
## 32 Longmeadow 2
## 33 Waltham 2
## 34 Chestnut Hill 2
## 35 Bridgewater 2
## 36 Worcester 2
## 37 Milton 2
## 38 Boston 2
## 39 Boston 2
## 40 Boston 2
## 41 Springfield 2
## 42 Bloomfield 2
## 43 Caldwell 2
## 44 Teaneck 2
## 45 Madison 2
## 46 Jersey City 2
## 47 Union 2
## 48 West Long Branch 2
## 49 Montclair 2
## 50 Mahwah 2
## 51 Newark 2
## 52 Jersey City 2
## 53 South Orange 2
## 54 Garden City 2
## 55 Alfred 1
## 56 New York 2
## 57 Brooklyn 2
## 58 Staten Island 2
## 59 New York 2
## 60 Bronx 2
## 61 Queens 2
## 62 Bronx 2
## 63 Hempstead 2
## 64 New Rochelle 2
## 65 Ithaca 2
## 66 Syracuse 2
## 67 Brookville 2
## 68 Riverdale 1
## 69 Poughkeepsie 2
## 70 Dobbs Ferry 2
## 71 Rockville Centre 2
## 72 Bronx 2
## 73 Newburgh 2
## 74 Rochester 2
## 75 New York 2
## 76 Old Westbury 2
## 77 New York 2
## 78 Rochester 2
## 79 Saint Bonaventure 0
## 80 Brooklyn Heights 2
## 81 Albany 2
## 82 Loudonville 2
## 83 Brooklyn 2
## 84 Patchogue 2
## 85 Rochester 2
## 86 Queens 2
## 87 Albany 2
## 88 Vestal 2
## 89 Stony Brook 2
## 90 Utica 0
## 91 Geneseo 2
## 92 New Paltz 2
## 93 Oswego 2
## 94 Old Westbury 2
## 95 Syracuse 2
## 96 New York 2
## 97 Albany 2
## 98 Utica 2
## 99 Staten Island 2
## 100 New York 2
## 101 Buies Creek 2
## 102 Greenville 2
## 103 Elon 2
## 104 Boiling Springs 2
## 105 Greensboro 2
## 106 Raleigh 2
## 107 Wingate 2
## 108 Cullowhee 2
## 109 Radnor 2
## 110 Philadelphia 2
## 111 Elizabethtown 2
## 112 Philadelphia 0
## 113 La Plume 2
## 114 Wilkes-Barre 1
## 115 Philadelphia 1
## 116 Bethlehem 1
## 117 Aston 2
## 118 Philadelphia 2
## 119 Langhorne 1
## 120 Scranton 0
## 121 Philadelphia 2
## 122 Villanova 0
## 123 Chester 2
## 124 York 2
## 125 Fairfax 2
## 126 Fort Myers 2
## 127 Fort Myers 2
## 128 Trevose 1
## 129 Danville 2
## 130 Fairfax 2
## 131 Melbourne 2
## 132 New York 2
## 133 Miramar 2
## 134 Charlotte 2
## 135 Ft. Washington 1
## 136 <NA> 2
## 137 <NA> 2
## 138 <NA> 2
## 139 <NA> 2
## 140 <NA> 2
## 141 <NA> 2
## 142 <NA> 2
## 143 <NA> 2
## 144 <NA> 0
## 145 <NA> 2
## 146 <NA> 2
## 147 <NA> 2
## 148 <NA> 2
## 149 <NA> 2
## 150 <NA> 2
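The output above contains two columns (CITY and school_score) for all 150 schools because `sample()` treats a data frame as a list of columns. If the intent is to draw random schools, meaning rows, either of the following patterns would do it. This is a sketch on made-up data; the seed and the toy table are arbitrary.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Toy table of schools (hypothetical values)
schools_toy <- tibble::tibble(
  NAME = paste("College", LETTERS[1:6]),
  school_score = c(2, 2, 1, 0, 2, 2)
)

# Base R: sample row indices, then subset
rows_base <- schools_toy[sample(nrow(schools_toy), size = 2), ]

# dplyr: slice_sample() draws rows directly
rows_dplyr <- slice_sample(schools_toy, n = 2)

rows_base
rows_dplyr
```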
# Randomly select schools: keep every "NO" school and fill the rest of the
# sample with randomly chosen "YES" schools
schools_bad <- temp_schools %>% filter(good_data_science_program == "NO")
schools_good <- temp_schools %>% filter(good_data_science_program == "YES")
sample_schools <- schools_good[sample(nrow(schools_good), sample_size - nrow(schools_bad)), ]
# Append the "NO" schools in one step; bind_rows() matches columns by name
sample_schools <- bind_rows(sample_schools, schools_bad)
# now we have our samples
school_split <- initial_split(sample_schools,
prop = 3/4)
school_split
## <Analysis/Assess/Total>
## <75/25/100>
school_train <- training(school_split)
school_test <- testing(school_split)
school_cv <- vfold_cv(school_train)
school_cv
## # 10-fold cross-validation
## # A tibble: 10 × 2
## splits id
## <list> <chr>
## 1 <split [67/8]> Fold01
## 2 <split [67/8]> Fold02
## 3 <split [67/8]> Fold03
## 4 <split [67/8]> Fold04
## 5 <split [67/8]> Fold05
## 6 <split [68/7]> Fold06
## 7 <split [68/7]> Fold07
## 8 <split [68/7]> Fold08
## 9 <split [68/7]> Fold09
## 10 <split [68/7]> Fold10
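Because the "NO" class is small, a purely random split can produce folds that contain no "NO" schools at all. One option, sketched here on toy data, is to stratify both the initial split and the folds on the outcome using the `strata` argument of rsample. The 82/18 class balance, the seed, and the use of five folds are chosen only for illustration.

```r
library(rsample)

set.seed(1)  # arbitrary seed for the sketch

# Toy data with an imbalanced outcome (hypothetical values)
toy <- data.frame(
  school_score = c(rep(2, 82), rep(1, 18)),
  good_data_science_program = c(rep("YES", 82), rep("NO", 18))
)

# Stratify the train/test split and the cross-validation folds on the outcome
split_toy <- initial_split(toy, prop = 3/4, strata = good_data_science_program)
folds_toy <- vfold_cv(training(split_toy), v = 5, strata = good_data_science_program)

# Count the "NO" schools in each fold's assessment set
no_per_fold <- sapply(
  folds_toy$splits,
  function(s) sum(assessment(s)$good_data_science_program == "NO")
)

no_per_fold
```

With stratification, every assessment set in this toy example receives some "NO" schools, so class-dependent metrics such as `roc_auc` can be computed in each fold.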
# define the recipe
school_recipe <-
# which consists of the formula (outcome ~ predictors)
recipe(good_data_science_program ~ match_technical_skills + match_soft_skills + school_score,
data = sample_schools) %>%
step_normalize(all_numeric()) %>%
step_impute_knn(all_predictors())
school_recipe
## Recipe
##
## Inputs:
##
## role #variables
## outcome 1
## predictor 3
##
## Operations:
##
## Centering and scaling for all_numeric()
## K-nearest neighbor imputation for all_predictors()
school_train_preprocessed <- school_recipe %>%
# apply the recipe to the training data
prep(school_train) %>%
# extract the pre-processed training dataset
juice()
school_train_preprocessed
## # A tibble: 75 × 4
## match_technical_skills match_soft_skills school_score good_data_science_prog…
## <dbl> <dbl> <dbl> <fct>
## 1 0.367 0.412 0.444 YES
## 2 0.367 0.412 0.444 YES
## 3 0.367 -2.40 -1.22 NO
## 4 0.367 0.412 0.444 YES
## 5 0.367 0.412 0.444 YES
## 6 0.367 0.412 0.444 YES
## 7 0.367 0.412 0.444 YES
## 8 0.367 0.412 0.444 YES
## 9 0.367 0.412 0.444 YES
## 10 0.367 0.412 0.444 YES
## # … with 65 more rows
rf_model <-
# specify that the model is a random forest
rand_forest() %>%
# specify that the `mtry` parameter needs to be tuned
set_args(mtry = tune()) %>%
# select the engine/package that underlies the model
set_engine("ranger", importance = "impurity") %>%
# choose either the continuous regression or binary classification mode
set_mode("classification")
# set the workflow
rf_workflow <- workflow() %>%
# add the recipe
add_recipe(school_recipe) %>%
# add the model
add_model(rf_model)
rf_workflow
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 2 Recipe Steps
##
## • step_normalize()
## • step_impute_knn()
##
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
##
## Main Arguments:
## mtry = tune()
##
## Engine-Specific Arguments:
## importance = impurity
##
## Computational engine: ranger
rf_grid <- expand.grid(mtry = c(2,3))
rf_tune_results <- rf_workflow %>%
tune_grid(resamples = school_cv, #CV object
grid = rf_grid, # grid of values to try
metrics = metric_set(accuracy, roc_auc) # metrics we care about
)
## ! Fold05: internal: No event observations were detected in `truth` with event leve...
## ! Fold10: internal: No event observations were detected in `truth` with event leve...
rf_tune_results %>%
collect_metrics()
## # A tibble: 4 × 7
## mtry .metric .estimator mean n std_err .config
## <dbl> <chr> <chr> <dbl> <int> <dbl> <fct>
## 1 2 accuracy binary 1 10 0 Preprocessor1_Model1
## 2 2 roc_auc binary 1 8 0 Preprocessor1_Model1
## 3 3 accuracy binary 1 10 0 Preprocessor1_Model2
## 4 3 roc_auc binary 1 8 0 Preprocessor1_Model2
param_final <- rf_tune_results %>%
select_best(metric = "accuracy")
#param_final
rf_workflow <- rf_workflow %>%
finalize_workflow(param_final)
rf_fit <- rf_workflow %>%
# fit on the training set and evaluate on test set
last_fit(school_split)
#rf_fit
test_performance <- rf_fit %>% collect_metrics()
#test_performance
test_predictions <- rf_fit %>% collect_predictions()
#test_predictions
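To inspect predictions beyond a single accuracy number, a confusion matrix is a natural next step. The sketch below uses yardstick on a small made-up prediction table. In the real workflow, the table returned by `collect_predictions()` would take its place, with `good_data_science_program` as the truth column, which is an assumption about how one would wire it up rather than something shown in the original code.

```r
library(yardstick)

# Toy predictions (hypothetical values); both columns must be factors with the same levels
preds_toy <- data.frame(
  truth = factor(c("YES", "YES", "NO", "YES", "NO"), levels = c("NO", "YES")),
  .pred_class = factor(c("YES", "YES", "NO", "NO", "NO"), levels = c("NO", "YES"))
)

# Cross-tabulate predicted class (rows) against true class (columns)
cm <- conf_mat(preds_toy, truth = truth, estimate = .pred_class)

# Overall accuracy: 4 of the 5 toy predictions are correct
acc <- accuracy(preds_toy, truth = truth, estimate = .pred_class)

cm
acc
```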
final_model <- fit(rf_workflow, sample_schools)
#final_model
# predict fictitious colleges
test_bad_college <- tibble(
name = "Test Bad college",
match_technical_skills = 0,
match_soft_skills = 1,
school_score = 1
)
test_good_college <- tibble(
name = "Test Good college",
match_technical_skills = 1,
match_soft_skills = 1,
school_score = 2
)
# Predict will output if the college has a good data science program
recommendation <- predict(final_model, new_data = test_bad_college)
print(paste0("For a college without a data science program the recommendation is ", recommendation$.pred_class))
## [1] "For a college without a data science program the recommendation is NO"
recommendation <- predict(final_model, new_data = test_good_college)
print(paste0("For a college with a data science program the recommendation is ", recommendation$.pred_class))
## [1] "For a college with a data science program the recommendation is YES"
# Dataframe of recommended schools.
temp_schools %>% filter(good_data_science_program == "YES")
## UNITID NAME
## 1 129242 Fairfield University
## 2 130253 Sacred Heart University
## 3 130943 University of Delaware
## 4 130989 Goldey-Beacom College
## 5 131113 Wilmington University
## 6 132471 Barry University
## 7 133951 Florida International University
## 8 134079 Florida Southern College
## 9 134097 Florida State University
## 10 134130 University of Florida
## 11 135726 University of Miami
## 12 136172 University of North Florida
## 13 137351 University of South Florida
## 14 137847 The University of Tampa
## 15 138293 Webber International University
## 16 138354 The University of West Florida
## 17 161253 University of Maine
## 18 161518 Saint Joseph's College of Maine
## 19 161554 University of Southern Maine
## 20 161563 Thomas College
## 21 161873 University of Baltimore
## 22 163204 University of Maryland Global Campus
## 23 163286 University of Maryland-College Park
## 24 163453 Morgan State University
## 25 164076 Towson University
## 26 164562 Assumption University
## 27 164580 Babson College
## 28 164632 Bay Path University
## 29 164739 Bentley University
## 30 164924 Boston College
## 31 165024 Bridgewater State University
## 32 165334 Clark University
## 33 165529 Curry College
## 34 166638 University of Massachusetts-Boston
## 35 167358 Northeastern University
## 36 168005 Suffolk University
## 37 168254 Western New England University
## 38 183822 Bloomfield College
## 39 183910 Caldwell University
## 40 184603 Fairleigh Dickinson University-Metropolitan Campus
## 41 184694 Fairleigh Dickinson University-Florham Campus
## 42 185129 New Jersey City University
## 43 185262 Kean University
## 44 185572 Monmouth University
## 45 185590 Montclair State University
## 46 186201 Ramapo College of New Jersey
## 47 186399 Rutgers University-Newark
## 48 186432 Saint Peter's University
## 49 186584 Seton Hall University
## 50 188429 Adelphi University
## 51 190512 CUNY Bernard M Baruch College
## 52 190549 CUNY Brooklyn College
## 53 190558 College of Staten Island CUNY
## 54 190594 CUNY Hunter College
## 55 190637 CUNY Lehman College
## 56 190664 CUNY Queens College
## 57 191241 Fordham University
## 58 191649 Hofstra University
## 59 191931 Iona College
## 60 191968 Ithaca College
## 61 192323 Le Moyne College
## 62 192448 Long Island University
## 63 192819 Marist College
## 64 193016 Mercy College
## 65 193292 Molloy College
## 66 193308 Monroe College
## 67 193353 Mount Saint Mary College
## 68 193584 Nazareth College
## 69 193900 New York University
## 70 194091 New York Institute of Technology
## 71 194310 Pace University
## 72 195003 Rochester Institute of Technology
## 73 195173 St Francis College
## 74 195234 The College of Saint Rose
## 75 195474 Siena College
## 76 195544 St. Joseph's College-New York
## 77 195562 St. Joseph's College-Long Island
## 78 195720 Saint John Fisher College
## 79 195809 St. John's University-New York
## 80 196060 SUNY at Albany
## 81 196079 Binghamton University
## 82 196097 Stony Brook University
## 83 196167 SUNY College at Geneseo
## 84 196176 State University of New York at New Paltz
## 85 196194 SUNY College at Oswego
## 86 196237 SUNY College at Old Westbury
## 87 196413 Syracuse University
## 88 196592 Touro College
## 89 196680 Excelsior College
## 90 197045 Utica College
## 91 197197 Wagner College
## 92 197708 Yeshiva University
## 93 198136 Campbell University
## 94 198464 East Carolina University
## 95 198516 Elon University
## 96 198561 Gardner-Webb University
## 97 199102 North Carolina A & T State University
## 98 199193 North Carolina State University at Raleigh
## 99 199962 Wingate University
## 100 200004 Western Carolina University
## 101 211352 Cabrini University
## 102 212054 Drexel University
## 103 212197 Elizabethtown College
## 104 213303 Keystone College
## 105 214272 Neumann University
## 106 215062 University of Pennsylvania
## 107 216339 Temple University
## 108 216852 Widener University
## 109 217059 York College of Pennsylvania
## 110 232186 George Mason University
## 111 367884 Hodges University
## 112 433660 Florida Gulf Coast University
## 113 449931 Averett University-Non-Traditional Programs
## 114 460376 Fairfax University of America
## 115 480569 Florida Institute of Technology-Online
## 116 482413 DeVry College of New York
## 117 482459 DeVry University-Florida
## 118 482565 DeVry University-North Carolina
## 119 NA Bloomsburg University of Pennsylvania
## 120 NA DeVry University-Virginia
## 121 NA Fitchberg State University
## 122 NA Nichols College
## 123 NA Pennsylvania State University
## 124 NA Saint Joseph's University\n
## 125 NA Strayer University - Delaware
## 126 NA Strayer University-North Carolina (online, for-profit)
## 127 NA University of Massachusetts-Lowell
## 128 NA University of Massachussetts - Amherst
## 129 NA University of Massachussetts - Dartmouth
## 130 NA University of North Carolina Chapel Hill
## 131 NA University of Vermont
## 132 NA Westfield State University
## STREET CITY STATE
## 1 1073 N Benson Rd Fairfield CT
## 2 5151 Park Ave Fairfield CT
## 3 104 Hullihen Hall Newark DE
## 4 4701 Limestone Rd Wilmington DE
## 5 320 Dupont Hwy New Castle DE
## 6 11300 NE 2nd Ave Miami FL
## 7 11200 S. W. 8 Street Miami FL
## 8 111 Lake Hollingsworth Dr Lakeland FL
## 9 222 S. Copeland Street Tallahassee FL
## 10 Tigert Hall Gainesville FL
## 11 University of Miami Coral Gables FL
## 12 1 UNF Drive Jacksonville FL
## 13 4202 East Fowler Ave Tampa FL
## 14 401 W Kennedy Blvd Tampa FL
## 15 1201 N Scenic Hwy Babson Park FL
## 16 11000 University Parkway Pensacola FL
## 17 168 College Avenue Orono ME
## 18 278 Whites Bridge Rd Standish ME
## 19 96 Falmouth St Portland ME
## 20 180 W River Rd Waterville ME
## 21 Charles at Mount Royal Baltimore MD
## 22 3501 University Blvd East Adelphi MD
## 23 M College Park MD
## 24 1700 East Cold Spring Lane Baltimore MD
## 25 8000 York Rd Towson MD
## 26 500 Salisbury St Worcester MA
## 27 231 Forest Street Wellesley MA
## 28 588 Longmeadow Street Longmeadow MA
## 29 175 Forest St Waltham MA
## 30 140 Commonwealth Avenue Chestnut Hill MA
## 31 131 Summer Street Bridgewater MA
## 32 950 Main St Worcester MA
## 33 1071 Blue Hill Ave Milton MA
## 34 100 Morrissey Boulevard Boston MA
## 35 360 Huntington Ave Boston MA
## 36 73 Tremont St. Boston MA
## 37 1215 Wilbraham Rd Springfield MA
## 38 467 Franklin St Bloomfield NJ
## 39 120 Bloomfield Avenue Caldwell NJ
## 40 1000 River Rd Teaneck NJ
## 41 285 Madison Ave Madison NJ
## 42 2039 Kennedy Blvd Jersey City NJ
## 43 1000 Morris Avenue Union NJ
## 44 400 Cedar Ave West Long Branch NJ
## 45 1 Normal Avenue Montclair NJ
## 46 505 Ramapo Valley Rd Mahwah NJ
## 47 249 University Avenue, Blumenthal Hall Newark NJ
## 48 2641 Kennedy Blvd Jersey City NJ
## 49 400 S Orange Ave South Orange NJ
## 50 South Ave Garden City NY
## 51 One Bernard Baruch Way (55 Lexington Ave at 24th St) New York NY
## 52 2900 Bedford Ave Brooklyn NY
## 53 2800 Victory Blvd Staten Island NY
## 54 695 Park Ave New York NY
## 55 250 Bedford Park Blvd West Bronx NY
## 56 65-30 Kissena Blvd Queens NY
## 57 441 E Fordham Rd Bronx NY
## 58 100 Hofstra University Hempstead NY
## 59 715 North Ave New Rochelle NY
## 60 953 Danby Road Ithaca NY
## 61 1419 Salt Springs Rd Syracuse NY
## 62 720 Northern Blvd Brookville NY
## 63 3399 North Rd Poughkeepsie NY
## 64 555 Broadway Dobbs Ferry NY
## 65 1000 Hempstead Ave Rockville Centre NY
## 66 2501 Jerome Avenue Bronx NY
## 67 330 Powell Avenue Newburgh NY
## 68 4245 East Ave Rochester NY
## 69 70 Washington Sq South New York NY
## 70 Northern Blvd Old Westbury NY
## 71 1 Pace Plaza New York NY
## 72 1 Lomb Memorial Dr Rochester NY
## 73 180 Remsen Street Brooklyn Heights NY
## 74 432 Western Ave Albany NY
## 75 515 Loudon Rd Loudonville NY
## 76 245 Clinton Ave Brooklyn NY
## 77 155 W Roe Blvd Patchogue NY
## 78 3690 East Ave Rochester NY
## 79 8000 Utopia Pky Queens NY
## 80 1400 Washington Avenue Albany NY
## 81 4400 Vestal Parkway East Vestal NY
## 82 310 Administration Building Stony Brook NY
## 83 1 College Circle Geneseo NY
## 84 1 Hawk Drive New Paltz NY
## 85 7060 State Route 104 Oswego NY
## 86 223 Store Hill Rd Old Westbury NY
## 87 900 South Crouse Ave. Syracuse NY
## 88 500 7th Avenue New York NY
## 89 7 Columbia Cir Albany NY
## 90 1600 Burrstone Rd Utica NY
## 91 One Campus Rd Staten Island NY
## 92 500 W 185th St New York NY
## 93 143 Main Street Buies Creek NC
## 94 East 5th Street Greenville NC
## 95 100 Campus Drive Elon NC
## 96 Main St Boiling Springs NC
## 97 1601 E Market St Greensboro NC
## 98 2101 Hillsborough Street Raleigh NC
## 99 301 E. Wilson Street Wingate NC
## 100 Highway 107 Cullowhee NC
## 101 610 King of Prussia Rd Radnor PA
## 102 3141 Chestnut St Philadelphia PA
## 103 One Alpha Drive Elizabethtown PA
## 104 One College Green La Plume PA
## 105 One Neumann Drive Aston PA
## 106 34th & Spruce Street Philadelphia PA
## 107 1801 North Broad Street Philadelphia PA
## 108 One University Place Chester PA
## 109 441 Country Club Rd York PA
## 110 4400 University Dr Fairfax VA
## 111 4501 Colonial Blvd Fort Myers FL
## 112 10501 Fgcu Blvd S Fort Myers FL
## 113 420 W Main St Danville VA
## 114 4401 Village Drive Fairfax VA
## 115 150 West University Blvd Melbourne FL
## 116 180 Madison Ave., Ste. 1200 New York NY
## 117 2300 SW 145th Ave. Miramar FL
## 118 2015 Ayrsley Town Blvd., Ste. 109 Charlotte NC
## 119 <NA> <NA> <NA>
## 120 <NA> <NA> <NA>
## 121 <NA> <NA> <NA>
## 122 <NA> <NA> <NA>
## 123 <NA> <NA> <NA>
## 124 <NA> <NA> <NA>
## 125 <NA> <NA> <NA>
## 126 <NA> <NA> <NA>
## 127 <NA> <NA> <NA>
## 128 <NA> <NA> <NA>
## 129 <NA> <NA> <NA>
## 130 <NA> <NA> <NA>
## 131 <NA> <NA> <NA>
## 132 <NA> <NA> <NA>
## ZIP STFIP CNTY NMCNTY LOCALE LAT LON
## 1 06824-5195 09 09001 Fairfield County 21 41.15767 -73.25590
## 2 06825-1000 09 09001 Fairfield County 21 41.22089 -73.24333
## 3 19716 10 10003 New Castle County 21 39.67958 -75.75282
## 4 19808 10 10003 New Castle County 21 39.74150 -75.68962
## 5 19720 10 10003 New Castle County 21 39.68230 -75.58700
## 6 33161-6695 12 12086 Miami-Dade County 21 25.87891 -80.19893
## 7 33199 12 12086 Miami-Dade County 21 25.75732 -80.37393
## 8 33801-5698 12 12105 Polk County 12 28.03244 -81.94820
## 9 32306-1037 12 12073 Leon County 12 30.44076 -84.29192
## 10 32611 12 12001 Alachua County 12 29.64629 -82.34791
## 11 33146 12 12086 Miami-Dade County 13 25.72126 -80.27866
## 12 32224-7699 12 12031 Duval County 11 30.27194 -81.50914
## 13 33620-9951 12 12057 Hillsborough County 11 28.06146 -82.41323
## 14 33606-1490 12 12057 Hillsborough County 11 27.94845 -82.46483
## 15 33827-0096 12 12105 Polk County 31 27.83878 -81.53231
## 16 32514-5750 12 12033 Escambia County 13 30.54908 -87.21851
## 17 04469 23 23019 Penobscot County 23 44.89926 -68.66933
## 18 04084-5236 23 23005 Cumberland County 41 43.82631 -70.48337
## 19 04103 23 23005 Cumberland County 13 43.66286 -70.27425
## 20 04901-5097 23 23011 Kennebec County 41 44.52491 -69.66473
## 21 21201-5720 24 24510 Baltimore city 11 39.30583 -76.61659
## 22 20783-8010 24 24033 Prince George's County 21 38.91271 -76.84758
## 23 20742 24 24033 Prince George's County 21 38.98818 -76.94472
## 24 21251-0001 24 24510 Baltimore city 11 39.34416 -76.58557
## 25 21252-0001 24 24005 Baltimore County 13 39.39362 -76.61116
## 26 01609-1296 25 25027 Worcester County 12 42.29423 -71.82899
## 27 02457-0310 25 25021 Norfolk County 21 42.29702 -71.26406
## 28 01106 25 25013 Hampden County 21 42.05509 -72.58338
## 29 02452-4705 25 25017 Middlesex County 13 42.38600 -71.22284
## 30 02467 25 25017 Middlesex County 13 42.33621 -71.16924
## 31 02325 25 25023 Plymouth County 21 41.98749 -70.97455
## 32 01610-1477 25 25027 Worcester County 12 42.24999 -71.82336
## 33 02186-2395 25 25021 Norfolk County 21 42.23806 -71.11654
## 34 02125-3393 25 25025 Suffolk County 11 42.31288 -71.03687
## 35 02115-5005 25 25025 Suffolk County 11 42.33999 -71.08878
## 36 02108-3901 25 25025 Suffolk County 11 42.35795 -71.06092
## 37 01119-2684 25 25013 Hampden County 12 42.11502 -72.52047
## 38 07003 34 34013 Essex County 21 40.79510 -74.19431
## 39 07006-6195 34 34013 Essex County 21 40.83275 -74.27257
## 40 07666 34 34003 Bergen County 21 40.89721 -74.02899
## 41 07940 34 34027 Morris County 21 40.77450 -74.43212
## 42 07305 34 34017 Hudson County 11 40.70994 -74.08727
## 43 07083 34 34039 Union County 21 40.67798 -74.23350
## 44 07764-1898 34 34025 Monmouth County 21 40.28007 -74.00645
## 45 07043-1624 34 34031 Passaic County 21 40.86041 -74.19814
## 46 07430-1680 34 34003 Bergen County 21 41.08094 -74.17409
## 47 07102 34 34013 Essex County 11 40.73912 -74.17581
## 48 07306-5997 34 34017 Hudson County 11 40.72711 -74.07154
## 49 07079-2697 34 34013 Essex County 21 40.74234 -74.24603
## 50 11530-0701 36 36059 Nassau County 21 40.72144 -73.65332
## 51 10010 36 36061 New York County 11 40.74024 -73.98342
## 52 11210 36 36047 Kings County 11 40.63152 -73.94990
## 53 10314 36 36085 Richmond County 11 40.60183 -74.14849
## 54 10065 36 36061 New York County 11 40.76867 -73.96479
## 55 10468 36 36005 Bronx County 11 40.87296 -73.89538
## 56 11367 36 36081 Queens County 11 40.73518 -73.81610
## 57 10458 36 36005 Bronx County 11 40.85935 -73.88271
## 58 11549 36 36059 Nassau County 21 40.71596 -73.60078
## 59 10801-1890 36 36119 Westchester County 21 40.92572 -73.78805
## 60 14850-7002 36 36109 Tompkins County 23 42.42215 -76.49414
## 61 13214-1301 36 36067 Onondaga County 21 43.04919 -76.09043
## 62 11548-1327 36 36059 Nassau County 21 40.82071 -73.59368
## 63 12601 36 36027 Dutchess County 21 41.72094 -73.93548
## 64 10522 36 36119 Westchester County 21 41.02163 -73.87445
## 65 11571-5002 36 36059 Nassau County 21 40.68594 -73.62618
## 66 10468 36 36005 Bronx County 11 40.86446 -73.90022
## 67 12550 36 36071 Orange County 13 41.51387 -74.01265
## 68 14618-3790 36 36055 Monroe County 21 43.10158 -77.51858
## 69 10012-1091 36 36061 New York County 11 40.72945 -73.99726
## 70 11568-8000 36 36059 Nassau County 21 40.81245 -73.60780
## 71 10038-1598 36 36061 New York County 11 40.71101 -74.00472
## 72 14623-5603 36 36055 Monroe County 21 43.08419 -77.67386
## 73 11201-4305 36 36047 Kings County 11 40.69323 -73.99216
## 74 12203-1490 36 36001 Albany County 13 42.66430 -73.78666
## 75 12211-1462 36 36001 Albany County 21 42.71760 -73.75260
## 76 11205-3688 36 36047 Kings County 11 40.69042 -73.96766
## 77 11772 36 36103 Suffolk County 21 40.77593 -73.02466
## 78 14618-3597 36 36055 Monroe County 21 43.11626 -77.51306
## 79 11439 36 36081 Queens County 11 40.72252 -73.79610
## 80 12222 36 36001 Albany County 13 42.68549 -73.82466
## 81 13850-6000 36 36007 Broome County 22 42.08787 -75.96689
## 82 11794-0701 36 36103 Suffolk County 21 40.91476 -73.12046
## 83 14454-1465 36 36051 Livingston County 32 42.79664 -77.82189
## 84 12561-2443 36 36111 Ulster County 21 41.74094 -74.08219
## 85 13126 36 36075 Oswego County 32 43.45429 -76.54080
## 86 11568-0210 36 36059 Nassau County 21 40.79902 -73.57191
## 87 13244 36 36067 Onondaga County 12 43.04018 -76.13698
## 88 10018 36 36061 New York County 11 40.75320 -73.98940
## 89 12203-5159 36 36001 Albany County 13 42.70549 -73.86298
## 90 13502-4892 36 36065 Oneida County 13 43.09621 -75.27292
## 91 10301-4495 36 36085 Richmond County 11 40.61559 -74.09291
## 92 10033-3299 36 36061 New York County 11 40.85061 -73.92987
## 93 27506 37 37085 Harnett County 31 35.40915 -78.73824
## 94 27858-4353 37 37147 Pitt County 13 35.60719 -77.36829
## 95 27244-2010 37 37001 Alamance County 22 36.10415 -79.50344
## 96 28017-0997 37 37045 Cleveland County 32 35.24732 -81.66814
## 97 27411 37 37081 Guilford County 11 36.07282 -79.77338
## 98 27695-7001 37 37183 Wake County 11 35.78511 -78.67452
## 99 28174-0159 37 37179 Union County 21 34.98606 -80.44305
## 100 28723-9646 37 37099 Jackson County 32 35.30898 -83.18626
## 101 19087-3698 42 42045 Delaware County 21 40.05636 -75.37526
## 102 19104 42 42101 Philadelphia County 11 39.95522 -75.19005
## 103 17022-2298 42 42071 Lancaster County 21 40.14924 -76.59322
## 104 18440-0200 42 42131 Wyoming County 21 41.55897 -75.77746
## 105 19014-1298 42 42045 Delaware County 21 39.87488 -75.44002
## 106 19104-6303 42 42101 Philadelphia County 11 39.95093 -75.19391
## 107 19122-6096 42 42101 Philadelphia County 11 39.98055 -75.15686
## 108 19013-5792 42 42045 Delaware County 21 39.86169 -75.35536
## 109 17403-3651 42 42133 York County 22 39.94614 -76.72798
## 110 22030-4444 51 51059 Fairfax County 21 38.82998 -77.30743
## 111 33966 12 12071 Lee County 13 26.61087 -81.82142
## 112 33965-6565 12 12071 Lee County 21 26.46364 -81.77260
## 113 24541 51 51590 Danville city 32 36.57729 -79.41320
## 114 22030-0000 51 51059 Fairfax County 21 38.84905 -77.34775
## 115 32901-6975 12 12009 Brevard County 13 28.06575 -80.62438
## 116 10016 36 36061 New York County 11 40.74775 -73.98349
## 117 33027 12 12011 Broward County 21 25.98764 -80.33984
## 118 28273 37 37119 Mecklenburg County 11 35.13766 -80.93197
## 119 <NA> <NA> <NA> <NA> <NA> NA NA
## 120 <NA> <NA> <NA> <NA> <NA> NA NA
## 121 <NA> <NA> <NA> <NA> <NA> NA NA
## 122 <NA> <NA> <NA> <NA> <NA> NA NA
## 123 <NA> <NA> <NA> <NA> <NA> NA NA
## 124 <NA> <NA> <NA> <NA> <NA> NA NA
## 125 <NA> <NA> <NA> <NA> <NA> NA NA
## 126 <NA> <NA> <NA> <NA> <NA> NA NA
## 127 <NA> <NA> <NA> <NA> <NA> NA NA
## 128 <NA> <NA> <NA> <NA> <NA> NA NA
## 129 <NA> <NA> <NA> <NA> <NA> NA NA
## 130 <NA> <NA> <NA> <NA> <NA> NA NA
## 131 <NA> <NA> <NA> <NA> <NA> NA NA
## 132 <NA> <NA> <NA> <NA> <NA> NA NA
## CBSA NMCBSA CBSATYPE CSA
## 1 14860 Bridgeport-Stamford-Norwalk, CT 1 408
## 2 14860 Bridgeport-Stamford-Norwalk, CT 1 408
## 3 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 4 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 5 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 6 33100 Miami-Fort Lauderdale-Pompano Beach, FL 1 370
## 7 33100 Miami-Fort Lauderdale-Pompano Beach, FL 1 370
## 8 29460 Lakeland-Winter Haven, FL 1 422
## 9 45220 Tallahassee, FL 1 N
## 10 23540 Gainesville, FL 1 264
## 11 33100 Miami-Fort Lauderdale-Pompano Beach, FL 1 370
## 12 27260 Jacksonville, FL 1 300
## 13 45300 Tampa-St. Petersburg-Clearwater, FL 1 N
## 14 45300 Tampa-St. Petersburg-Clearwater, FL 1 N
## 15 29460 Lakeland-Winter Haven, FL 1 422
## 16 37860 Pensacola-Ferry Pass-Brent, FL 1 426
## 17 12620 Bangor, ME 1 N
## 18 38860 Portland-South Portland, ME 1 438
## 19 38860 Portland-South Portland, ME 1 438
## 20 12300 Augusta-Waterville, ME 2 N
## 21 12580 Baltimore-Columbia-Towson, MD 1 548
## 22 47900 Washington-Arlington-Alexandria, DC-VA-MD-WV 1 548
## 23 47900 Washington-Arlington-Alexandria, DC-VA-MD-WV 1 548
## 24 12580 Baltimore-Columbia-Towson, MD 1 548
## 25 12580 Baltimore-Columbia-Towson, MD 1 548
## 26 49340 Worcester, MA-CT 1 148
## 27 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 28 44140 Springfield, MA 1 N
## 29 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 30 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 31 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 32 49340 Worcester, MA-CT 1 148
## 33 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 34 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 35 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 36 14460 Boston-Cambridge-Newton, MA-NH 1 148
## 37 44140 Springfield, MA 1 N
## 38 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 39 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 40 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 41 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 42 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 43 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 44 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 45 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 46 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 47 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 48 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 49 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 50 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 51 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 52 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 53 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 54 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 55 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 56 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 57 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 58 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 59 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 60 27060 Ithaca, NY 1 296
## 61 45060 Syracuse, NY 1 532
## 62 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 63 39100 Poughkeepsie-Newburgh-Middletown, NY 1 408
## 64 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 65 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 66 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 67 39100 Poughkeepsie-Newburgh-Middletown, NY 1 408
## 68 40380 Rochester, NY 1 464
## 69 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 70 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 71 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 72 40380 Rochester, NY 1 464
## 73 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 74 10580 Albany-Schenectady-Troy, NY 1 104
## 75 10580 Albany-Schenectady-Troy, NY 1 104
## 76 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 77 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 78 40380 Rochester, NY 1 464
## 79 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 80 10580 Albany-Schenectady-Troy, NY 1 104
## 81 13780 Binghamton, NY 1 N
## 82 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 83 40380 Rochester, NY 1 464
## 84 28740 Kingston, NY 1 408
## 85 45060 Syracuse, NY 1 532
## 86 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 87 45060 Syracuse, NY 1 532
## 88 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 89 10580 Albany-Schenectady-Troy, NY 1 104
## 90 46540 Utica-Rome, NY 1 N
## 91 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 92 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 93 22180 Fayetteville, NC 1 246
## 94 24780 Greenville, NC 1 272
## 95 15500 Burlington, NC 1 268
## 96 43140 Shelby, NC 2 172
## 97 24660 Greensboro-High Point, NC 1 268
## 98 39580 Raleigh-Cary, NC 1 450
## 99 16740 Charlotte-Concord-Gastonia, NC-SC 1 172
## 100 19000 Cullowhee, NC 2 N
## 101 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 102 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 103 29540 Lancaster, PA 1 N
## 104 42540 Scranton--Wilkes-Barre, PA 1 N
## 105 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 106 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 107 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 108 37980 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 1 428
## 109 49620 York-Hanover, PA 1 276
## 110 47900 Washington-Arlington-Alexandria, DC-VA-MD-WV 1 548
## 111 15980 Cape Coral-Fort Myers, FL 1 163
## 112 15980 Cape Coral-Fort Myers, FL 1 163
## 113 19260 Danville, VA 2 N
## 114 47900 Washington-Arlington-Alexandria, DC-VA-MD-WV 1 548
## 115 37340 Palm Bay-Melbourne-Titusville, FL 1 N
## 116 35620 New York-Newark-Jersey City, NY-NJ-PA 1 408
## 117 33100 Miami-Fort Lauderdale-Pompano Beach, FL 1 370
## 118 16740 Charlotte-Concord-Gastonia, NC-SC 1 172
## 119 <NA> <NA> NA <NA>
## 120 <NA> <NA> NA <NA>
## 121 <NA> <NA> NA <NA>
## 122 <NA> <NA> NA <NA>
## 123 <NA> <NA> NA <NA>
## 124 <NA> <NA> NA <NA>
## 125 <NA> <NA> NA <NA>
## 126 <NA> <NA> NA <NA>
## 127 <NA> <NA> NA <NA>
## 128 <NA> <NA> NA <NA>
## 129 <NA> <NA> NA <NA>
## 130 <NA> <NA> NA <NA>
## 131 <NA> <NA> NA <NA>
## 132 <NA> <NA> NA <NA>
## NMCSA NECTA
## 1 New York-Newark, NY-NJ-CT-PA 71950
## 2 New York-Newark, NY-NJ-CT-PA 71950
## 3 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 4 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 5 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 6 Miami-Port St. Lucie-Fort Lauderdale, FL N
## 7 Miami-Port St. Lucie-Fort Lauderdale, FL N
## 8 Orlando-Lakeland-Deltona, FL N
## 9 N N
## 10 Gainesville-Lake City, FL N
## 11 Miami-Port St. Lucie-Fort Lauderdale, FL N
## 12 Jacksonville-St. Marys-Palatka, FL-GA N
## 13 N N
## 14 N N
## 15 Orlando-Lakeland-Deltona, FL N
## 16 Pensacola-Ferry Pass, FL-AL N
## 17 N 70750
## 18 Portland-Lewiston-South Portland, ME 76750
## 19 Portland-Lewiston-South Portland, ME 76750
## 20 N 78850
## 21 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 22 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 23 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 24 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 25 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 26 Boston-Worcester-Providence, MA-RI-NH-CT 79600
## 27 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 28 N 78100
## 29 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 30 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 31 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 32 Boston-Worcester-Providence, MA-RI-NH-CT 79600
## 33 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 34 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 35 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 36 Boston-Worcester-Providence, MA-RI-NH-CT 71650
## 37 N 78100
## 38 New York-Newark, NY-NJ-CT-PA N
## 39 New York-Newark, NY-NJ-CT-PA N
## 40 New York-Newark, NY-NJ-CT-PA N
## 41 New York-Newark, NY-NJ-CT-PA N
## 42 New York-Newark, NY-NJ-CT-PA N
## 43 New York-Newark, NY-NJ-CT-PA N
## 44 New York-Newark, NY-NJ-CT-PA N
## 45 New York-Newark, NY-NJ-CT-PA N
## 46 New York-Newark, NY-NJ-CT-PA N
## 47 New York-Newark, NY-NJ-CT-PA N
## 48 New York-Newark, NY-NJ-CT-PA N
## 49 New York-Newark, NY-NJ-CT-PA N
## 50 New York-Newark, NY-NJ-CT-PA N
## 51 New York-Newark, NY-NJ-CT-PA N
## 52 New York-Newark, NY-NJ-CT-PA N
## 53 New York-Newark, NY-NJ-CT-PA N
## 54 New York-Newark, NY-NJ-CT-PA N
## 55 New York-Newark, NY-NJ-CT-PA N
## 56 New York-Newark, NY-NJ-CT-PA N
## 57 New York-Newark, NY-NJ-CT-PA N
## 58 New York-Newark, NY-NJ-CT-PA N
## 59 New York-Newark, NY-NJ-CT-PA N
## 60 Ithaca-Cortland, NY N
## 61 Syracuse-Auburn, NY N
## 62 New York-Newark, NY-NJ-CT-PA N
## 63 New York-Newark, NY-NJ-CT-PA N
## 64 New York-Newark, NY-NJ-CT-PA N
## 65 New York-Newark, NY-NJ-CT-PA N
## 66 New York-Newark, NY-NJ-CT-PA N
## 67 New York-Newark, NY-NJ-CT-PA N
## 68 Rochester-Batavia-Seneca Falls, NY N
## 69 New York-Newark, NY-NJ-CT-PA N
## 70 New York-Newark, NY-NJ-CT-PA N
## 71 New York-Newark, NY-NJ-CT-PA N
## 72 Rochester-Batavia-Seneca Falls, NY N
## 73 New York-Newark, NY-NJ-CT-PA N
## 74 Albany-Schenectady, NY N
## 75 Albany-Schenectady, NY N
## 76 New York-Newark, NY-NJ-CT-PA N
## 77 New York-Newark, NY-NJ-CT-PA N
## 78 Rochester-Batavia-Seneca Falls, NY N
## 79 New York-Newark, NY-NJ-CT-PA N
## 80 Albany-Schenectady, NY N
## 81 N N
## 82 New York-Newark, NY-NJ-CT-PA N
## 83 Rochester-Batavia-Seneca Falls, NY N
## 84 New York-Newark, NY-NJ-CT-PA N
## 85 Syracuse-Auburn, NY N
## 86 New York-Newark, NY-NJ-CT-PA N
## 87 Syracuse-Auburn, NY N
## 88 New York-Newark, NY-NJ-CT-PA N
## 89 Albany-Schenectady, NY N
## 90 N N
## 91 New York-Newark, NY-NJ-CT-PA N
## 92 New York-Newark, NY-NJ-CT-PA N
## 93 Fayetteville-Sanford-Lumberton, NC N
## 94 Greenville-Kinston-Washington, NC N
## 95 Greensboro--Winston-Salem--High Point, NC N
## 96 Charlotte-Concord, NC-SC N
## 97 Greensboro--Winston-Salem--High Point, NC N
## 98 Raleigh-Durham-Cary, NC N
## 99 Charlotte-Concord, NC-SC N
## 100 N N
## 101 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 102 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 103 N N
## 104 N N
## 105 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 106 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 107 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 108 Philadelphia-Reading-Camden, PA-NJ-DE-MD N
## 109 Harrisburg-York-Lebanon, PA N
## 110 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 111 Cape Coral-Fort Myers-Naples, FL N
## 112 Cape Coral-Fort Myers-Naples, FL N
## 113 N N
## 114 Washington-Baltimore-Arlington, DC-MD-VA-WV-PA N
## 115 N N
## 116 New York-Newark, NY-NJ-CT-PA N
## 117 Miami-Port St. Lucie-Fort Lauderdale, FL N
## 118 Charlotte-Concord, NC-SC N
## 119 <NA> <NA>
## 120 <NA> <NA>
## 121 <NA> <NA>
## 122 <NA> <NA>
## 123 <NA> <NA>
## 124 <NA> <NA>
## 125 <NA> <NA>
## 126 <NA> <NA>
## 127 <NA> <NA>
## 128 <NA> <NA>
## 129 <NA> <NA>
## 130 <NA> <NA>
## 131 <NA> <NA>
## 132 <NA> <NA>
## NMNECTA CD SLDL SLDU SCHOOLYEAR
## 1 Bridgeport-Stamford-Norwalk, CT 0904 09133 09028 2020-2021
## 2 Bridgeport-Stamford-Norwalk, CT 0904 09134 09028 2020-2021
## 3 N 1000 10025 10008 2020-2021
## 4 N 1000 10021 10004 2020-2021
## 5 N 1000 10017 10012 2020-2021
## 6 N 1224 12108 12038 2020-2021
## 7 N 1226 12116 12039 2020-2021
## 8 N 1215 12040 12022 2020-2021
## 9 N 1202 12009 12003 2020-2021
## 10 N 1203 12021 12008 2020-2021
## 11 N 1227 12114 12037 2020-2021
## 12 N 1204 12012 12004 2020-2021
## 13 N 1214 12063 12020 2020-2021
## 14 N 1214 12060 12018 2020-2021
## 15 N 1209 12042 12026 2020-2021
## 16 N 1201 12001 12001 2020-2021
## 17 Bangor, ME 2302 23123 23005 2020-2021
## 18 Portland-South Portland, ME 2301 23023 23026 2020-2021
## 19 Portland-South Portland, ME 2301 23040 23027 2020-2021
## 20 Waterville, ME 2301 23109 23016 2020-2021
## 21 N 2407 24045 24045 2020-2021
## 22 N 2404 24024 24024 2020-2021
## 23 N 2405 24021 24021 2020-2021
## 24 N 2407 24043 24043 2020-2021
## 25 N 2402 2442A 24042 2020-2021
## 26 Worcester, MA-CT 2502 25215 25010 2020-2021
## 27 Boston-Cambridge-Newton, MA-NH 2504 25170 25017 2020-2021
## 28 Springfield, MA-CT 2501 25103 25007 2020-2021
## 29 Boston-Cambridge-Newton, MA-NH 2505 25127 25016 2020-2021
## 30 Boston-Cambridge-Newton, MA-NH 2504 25129 25029 2020-2021
## 31 Boston-Cambridge-Newton, MA-NH 2508 25179 25036 2020-2021
## 32 Worcester, MA-CT 2502 25219 25010 2020-2021
## 33 Boston-Cambridge-Newton, MA-NH 2507 25163 25032 2020-2021
## 34 Boston-Cambridge-Newton, MA-NH 2508 25187 25001 2020-2021
## 35 Boston-Cambridge-Newton, MA-NH 2507 25190 25028 2020-2021
## 36 Boston-Cambridge-Newton, MA-NH 2508 25186 25027 2020-2021
## 37 Springfield, MA-CT 2501 25110 25007 2020-2021
## 38 N 3410 34028 34028 2020-2021
## 39 N 3411 34027 34027 2020-2021
## 40 N 3405 34037 34037 2020-2021
## 41 N 3411 34027 34027 2020-2021
## 42 N 3410 34031 34031 2020-2021
## 43 N 3410 34020 34020 2020-2021
## 44 N 3406 34011 34011 2020-2021
## 45 N 3411 34040 34040 2020-2021
## 46 N 3405 34039 34039 2020-2021
## 47 N 3410 34029 34029 2020-2021
## 48 N 3410 34033 34033 2020-2021
## 49 N 3410 34027 34027 2020-2021
## 50 N 3604 36019 36006 2020-2021
## 51 N 3612 36075 36028 2020-2021
## 52 N 3609 36042 36021 2020-2021
## 53 N 3611 36063 36024 2020-2021
## 54 N 3612 36073 36028 2020-2021
## 55 N 3613 36081 36034 2020-2021
## 56 N 3606 36025 36016 2020-2021
## 57 N 3615 36078 36034 2020-2021
## 58 N 3604 36019 36006 2020-2021
## 59 N 3616 36088 36037 2020-2021
## 60 N 3623 36125 36058 2020-2021
## 61 N 3624 36128 36050 2020-2021
## 62 N 3603 36019 36005 2020-2021
## 63 N 3618 36106 36041 2020-2021
## 64 N 3617 36092 36035 2020-2021
## 65 N 3604 36021 36009 2020-2021
## 66 N 3613 36078 36033 2020-2021
## 67 N 3618 36104 36039 2020-2021
## 68 N 3625 36133 36055 2020-2021
## 69 N 3610 36066 36027 2020-2021
## 70 N 3603 36019 36005 2020-2021
## 71 N 3610 36066 36026 2020-2021
## 72 N 3625 36138 36059 2020-2021
## 73 N 3607 36052 36026 2020-2021
## 74 N 3620 36109 36044 2020-2021
## 75 N 3620 36110 36044 2020-2021
## 76 N 3608 36057 36025 2020-2021
## 77 N 3601 36007 36003 2020-2021
## 78 N 3625 36133 36055 2020-2021
## 79 N 3605 36024 36014 2020-2021
## 80 N 3620 36109 36044 2020-2021
## 81 N 3622 36123 36052 2020-2021
## 82 N 3601 36004 36002 2020-2021
## 83 N 3627 36133 36059 2020-2021
## 84 N 3619 36103 36042 2020-2021
## 85 N 3624 36130 36048 2020-2021
## 86 N 3603 36015 36005 2020-2021
## 87 N 3624 36129 36053 2020-2021
## 88 N 3610 36075 36031 2020-2021
## 89 N 3620 36109 36044 2020-2021
## 90 N 3622 36119 36047 2020-2021
## 91 N 3611 36063 36023 2020-2021
## 92 N 3613 36072 36031 2020-2021
## 93 N 3702 37053 37012 2020-2021
## 94 N 3701 37008 37005 2020-2021
## 95 N 3706 37064 37024 2020-2021
## 96 N 3710 37111 37044 2020-2021
## 97 N 3706 37061 37028 2020-2021
## 98 N 3704 37033 37015 2020-2021
## 99 N 3709 37069 37035 2020-2021
## 100 N 3711 37119 37050 2020-2021
## 101 N 4205 42165 42017 2020-2021
## 102 N 4203 42188 42007 2020-2021
## 103 N 4211 42098 42036 2020-2021
## 104 N 4212 42117 42020 2020-2021
## 105 N 4205 42161 42009 2020-2021
## 106 N 4203 42188 42008 2020-2021
## 107 N 4202 42181 42003 2020-2021
## 108 N 4205 42159 42009 2020-2021
## 109 N 4210 42095 42028 2020-2021
## 110 N 5111 51037 51034 2020-2021
## 111 N 1219 12078 12027 2020-2021
## 112 N 1219 12078 12027 2020-2021
## 113 N 5105 51014 51020 2020-2021
## 114 N 5111 51037 51037 2020-2021
## 115 N 1208 12052 12017 2020-2021
## 116 N 3612 36075 36027 2020-2021
## 117 N 1220 12103 12035 2020-2021
## 118 N 3712 37092 37037 2020-2021
## 119 <NA> <NA> <NA> <NA> <NA>
## 120 <NA> <NA> <NA> <NA> <NA>
## 121 <NA> <NA> <NA> <NA> <NA>
## 122 <NA> <NA> <NA> <NA> <NA>
## 123 <NA> <NA> <NA> <NA> <NA>
## 124 <NA> <NA> <NA> <NA> <NA>
## 125 <NA> <NA> <NA> <NA> <NA>
## 126 <NA> <NA> <NA> <NA> <NA>
## 127 <NA> <NA> <NA> <NA> <NA>
## 128 <NA> <NA> <NA> <NA> <NA>
## 129 <NA> <NA> <NA> <NA> <NA>
## 130 <NA> <NA> <NA> <NA> <NA>
## 131 <NA> <NA> <NA> <NA> <NA>
## 132 <NA> <NA> <NA> <NA> <NA>
## match_technical_skills match_soft_skills school_score
## 1 1 1 2
## 2 1 1 2
## 3 1 1 2
## 4 1 1 2
## 5 1 1 2
## 6 1 1 2
## 7 1 1 2
## 8 1 1 2
## 9 1 1 2
## 10 1 1 2
## 11 1 1 2
## 12 1 1 2
## 13 1 1 2
## 14 1 1 2
## 15 1 1 2
## 16 1 1 2
## 17 1 1 2
## 18 1 1 2
## 19 1 1 2
## 20 1 1 2
## 21 1 1 2
## 22 1 1 2
## 23 1 1 2
## 24 1 1 2
## 25 1 1 2
## 26 1 1 2
## 27 1 1 2
## 28 1 1 2
## 29 1 1 2
## 30 1 1 2
## 31 1 1 2
## 32 1 1 2
## 33 1 1 2
## 34 1 1 2
## 35 1 1 2
## 36 1 1 2
## 37 1 1 2
## 38 1 1 2
## 39 1 1 2
## 40 1 1 2
## 41 1 1 2
## 42 1 1 2
## 43 1 1 2
## 44 1 1 2
## 45 1 1 2
## 46 1 1 2
## 47 1 1 2
## 48 1 1 2
## 49 1 1 2
## 50 1 1 2
## 51 1 1 2
## 52 1 1 2
## 53 1 1 2
## 54 1 1 2
## 55 1 1 2
## 56 1 1 2
## 57 1 1 2
## 58 1 1 2
## 59 1 1 2
## 60 1 1 2
## 61 1 1 2
## 62 1 1 2
## 63 1 1 2
## 64 1 1 2
## 65 1 1 2
## 66 1 1 2
## 67 1 1 2
## 68 1 1 2
## 69 1 1 2
## 70 1 1 2
## 71 1 1 2
## 72 1 1 2
## 73 1 1 2
## 74 1 1 2
## 75 1 1 2
## 76 1 1 2
## 77 1 1 2
## 78 1 1 2
## 79 1 1 2
## 80 1 1 2
## 81 1 1 2
## 82 1 1 2
## 83 1 1 2
## 84 1 1 2
## 85 1 1 2
## 86 1 1 2
## 87 1 1 2
## 88 1 1 2
## 89 1 1 2
## 90 1 1 2
## 91 1 1 2
## 92 1 1 2
## 93 1 1 2
## 94 1 1 2
## 95 1 1 2
## 96 1 1 2
## 97 1 1 2
## 98 1 1 2
## 99 1 1 2
## 100 1 1 2
## 101 1 1 2
## 102 1 1 2
## 103 1 1 2
## 104 1 1 2
## 105 1 1 2
## 106 1 1 2
## 107 1 1 2
## 108 1 1 2
## 109 1 1 2
## 110 1 1 2
## 111 1 1 2
## 112 1 1 2
## 113 1 1 2
## 114 1 1 2
## 115 1 1 2
## 116 1 1 2
## 117 1 1 2
## 118 1 1 2
## 119 1 1 2
## 120 1 1 2
## 121 1 1 2
## 122 1 1 2
## 123 1 1 2
## 124 1 1 2
## 125 1 1 2
## 126 1 1 2
## 127 1 1 2
## 128 1 1 2
## 129 1 1 2
## 130 1 1 2
## 131 1 1 2
## 132 1 1 2
## good_data_science_program
## 1 YES
## 2 YES
## 3 YES
## 4 YES
## 5 YES
## 6 YES
## 7 YES
## 8 YES
## 9 YES
## 10 YES
## 11 YES
## 12 YES
## 13 YES
## 14 YES
## 15 YES
## 16 YES
## 17 YES
## 18 YES
## 19 YES
## 20 YES
## 21 YES
## 22 YES
## 23 YES
## 24 YES
## 25 YES
## 26 YES
## 27 YES
## 28 YES
## 29 YES
## 30 YES
## 31 YES
## 32 YES
## 33 YES
## 34 YES
## 35 YES
## 36 YES
## 37 YES
## 38 YES
## 39 YES
## 40 YES
## 41 YES
## 42 YES
## 43 YES
## 44 YES
## 45 YES
## 46 YES
## 47 YES
## 48 YES
## 49 YES
## 50 YES
## 51 YES
## 52 YES
## 53 YES
## 54 YES
## 55 YES
## 56 YES
## 57 YES
## 58 YES
## 59 YES
## 60 YES
## 61 YES
## 62 YES
## 63 YES
## 64 YES
## 65 YES
## 66 YES
## 67 YES
## 68 YES
## 69 YES
## 70 YES
## 71 YES
## 72 YES
## 73 YES
## 74 YES
## 75 YES
## 76 YES
## 77 YES
## 78 YES
## 79 YES
## 80 YES
## 81 YES
## 82 YES
## 83 YES
## 84 YES
## 85 YES
## 86 YES
## 87 YES
## 88 YES
## 89 YES
## 90 YES
## 91 YES
## 92 YES
## 93 YES
## 94 YES
## 95 YES
## 96 YES
## 97 YES
## 98 YES
## 99 YES
## 100 YES
## 101 YES
## 102 YES
## 103 YES
## 104 YES
## 105 YES
## 106 YES
## 107 YES
## 108 YES
## 109 YES
## 110 YES
## 111 YES
## 112 YES
## 113 YES
## 114 YES
## 115 YES
## 116 YES
## 117 YES
## 118 YES
## 119 YES
## 120 YES
## 121 YES
## 122 YES
## 123 YES
## 124 YES
## 125 YES
## 126 YES
## 127 YES
## 128 YES
## 129 YES
## 130 YES
## 131 YES
## 132 YES
# Open the table of recommended schools in the interactive data viewer.
view(temp_schools)
# Build one HTML label per school: the name in its own paragraph, then "City, State".
# The closing </p> ends the paragraph, which puts the city and state on a new line.
temp_schools$label <- paste0("<p>", temp_schools$NAME, "</p>",
                             temp_schools$CITY, ", ", temp_schools$STATE)
# Create a Leaflet map centered on the US eastern seaboard.
# lapply() wraps each label in HTML() so the <p> markup is rendered instead of shown as literal text.
leaflet(temp_schools) %>%
addProviderTiles("CartoDB") %>%
setView(-80.95, 35.635, zoom = 4) %>%
addCircles(lat = ~ LAT, lng = ~ LON, label = lapply(temp_schools$label, HTML))
## Warning in validateCoords(lng, lat, funcName): Data contains 15 rows with either
## missing or invalid lat/lon values and will be ignored
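The warning above is raised because some rows carry no coordinates. The following is a minimal sketch, not the project's own code, of separating those rows before mapping so that the omissions are explicit rather than silent. The data frame `schools_demo` is a hypothetical three-row stand-in for `temp_schools`, and the names `has_coords`, `mappable`, and `unmappable` are introduced here for illustration only.

```r
# Hypothetical stand-in for temp_schools, limited to the columns the map uses.
schools_demo <- data.frame(
  NAME = c("George Mason University", "University of Vermont", "Nichols College"),
  LAT  = c(38.82998, NA, NA),
  LON  = c(-77.30743, NA, NA),
  stringsAsFactors = FALSE
)

# Flag rows where both coordinates are present.
has_coords <- !is.na(schools_demo$LAT) & !is.na(schools_demo$LON)

# Split the table into rows that can be plotted and rows that cannot.
mappable   <- schools_demo[has_coords, ]
unmappable <- schools_demo[!has_coords, ]

# Report which schools would be left off the map.
message(nrow(unmappable), " school(s) lack coordinates: ",
        paste(unmappable$NAME, collapse = "; "))
```

Passing only the `mappable` rows to `leaflet()` should avoid the warning, and the `unmappable` rows give a list of schools that still need geocoding.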
By mining the words in course descriptions and joining them with vectors of desired skills, we built a recommender system with a few key predictors. We extended the model by adding geocoding and mapping features to perform basic cluster analysis. A visual overview centered on the eastern U.S. coastline shows the schools clustering predominantly in the northeast, around the New York City, Boston, and Philadelphia metro areas. North Carolina and Florida also show significant clustering.
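The visual impression of clustering can be checked with a simple count of schools per state. This is a rough sketch under stated assumptions: `state_demo` is a small illustrative vector, not the project's data, and in practice the `STATE` column of `temp_schools` would take its place.

```r
# Illustrative subset of a STATE column; NA stands for a school with no geocoded state.
state_demo <- c("NY", "NY", "NY", "MA", "MA", "PA", "FL", NA)

# Count schools per state, dropping missing values, and sort from most to fewest.
state_counts <- sort(table(state_demo, useNA = "no"), decreasing = TRUE)

print(state_counts)
```

The same pattern applied to a metro-area column such as `NMCBSA` would give counts per metropolitan area instead of per state.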
The model could be extended further with additional predictors such as tuition costs, post-graduation employment rates, and national university rankings.
Note that some colleges did not publish course descriptions, so the recommender system penalized them.